Abstract

An NP-hard combinatorial optimization problem Π is said to have an
approximation threshold if there is some t such that the optimal value of Π
can be approximated in polynomial time within a ratio of t, and it is NP-hard 
to approximate it within a ratio better than t. We survey some of the known 
approximation threshold results, and discuss the pattern that emerges from 
the known results. 
1. Introduction

Given an instance I of a combinatorial optimization problem Π, one wishes to
find a feasible solution of optimal value to I. For example, in the maximum clique
problem, the input instance is a graph G(V, E), a feasible solution is a clique (a set
of vertices S ⊆ V such that for all vertices u, v ∈ S, edge {u, v} ∈ E), the value of the
clique is its size, and the objective is to find a feasible solution of maximum value.
By introducing an extra parameter k to a combinatorial maximization problem,
one can formulate a decision problem of the form "does I have a feasible solution
of value at least k?" (or "at most k", for a minimization problem). In this work
we consider only combinatorial optimization problems whose decision version is 
NP-complete. Hence solving these problems is NP-hard, informally meaning that 
there is no polynomial time algorithm that is guaranteed to output the optimal 
solution for every input instance (unless P=NP). (For information on the theory of 
NP-completeness, see |H], or essentially any book on computational complexity.) 

One way of efficiently coping with NP-hard combinatorial optimization
problems is by using polynomial time approximation algorithms. These algorithms
produce a feasible solution that is not necessarily optimal. The quality of the
approximation algorithm is measured by its approximation ratio: the worst case ratio
between the value of the solution found by the algorithm and the value of the optimal
solution. Numerous approximation algorithms have been designed for various com- 
binatorial optimization problems, applying a diverse set of algorithmic techniques 
(such as greedy algorithms, dynamic programming, linear programming relaxations, 
applications of the probabilistic method, and more). See for example |14l I21[ |^. 

When one attempts to design an approximation algorithm for a specific prob- 
lem, it is natural to ask whether there are limits to the best approximation ratio 
achievable. The theory of NP-completeness is useful in this context, allowing one to 
prove NP-hardness of approximation results. Namely, for some value of ρ, achieving
an approximation ratio better than ρ is NP-hard. Of particular interest are threshold
results. A combinatorial optimization problem Π is said to have an approximation
threshold at t if there is a polynomial time algorithm that approximates the optimal
value within a ratio of t, and a hardness of approximation result that shows that
achieving approximation ratios better than t is NP-hard. 

The notion of an approximation threshold is often not fully appreciated, so let 
us discuss it in more detail. First, let us make the point that thresholds of approx- 
imation (in the sense above) need not exist at all. One may well imagine that for 
certain problems there is a gradual change in the complexity of achieving various 
approximation ratios: solving the problem exactly is NP-hard, approximating it 
within a ratio of ρ can be done in polynomial time, and achieving approximation
ratios between ρ and 1 is neither in P nor NP-hard, but rather of some intermediate
complexity. The existence of an approximation threshold says that nearly all ap- 
proximation ratios (except for ratios that differ from the threshold only in low order 
terms) are either NP-hard to achieve, or in P. This is analogous to the well known 
empirical observation that "most" combinatorial optimization problems that people 
study turn out to be either in P or NP-hard, with very few exceptions. But note 
that in the context of approximation ratios, the notion of "most" is well defined. 
The second point that we wish to make is that an NP-hardness of approximation
result is really a polynomial time algorithm. This algorithm reduces instances of
3SAT (or of some other NP-complete language) to instances of Π, and an
approximation within a ratio better than ρ for Π can be used in order to solve 3SAT.
Hence the threshold is a meeting point between two polynomial time algorithms:
the reduction and the approximation algorithm. Here is another way of looking at
it. For problem Π, we can say that approximation ratio ρ1 reduces to ρ2 if there is a
polynomial time algorithm that can approximate instances of Π within ratio ρ1 by
invoking a subroutine that approximates instances of Π within ratio ρ2. (Each call
to the subroutine counts as one time unit.) Two approximation ratios are
equivalent if they are mutually reducible to each other. The existence of a threshold
of approximation for problem Π says that essentially all approximation ratios for
it fall into two equivalence classes (those above the threshold and those below the
threshold). A priori, it is not clear why the number of equivalence classes should be
two.

Hardness of approximation results are often (though not always) proved using 
"PCP techniques". For more details on these techniques, see for example the survey
of Arora Pj. We remark that in all cases the hardness results apply also to the 
problem of estimating the optimal value for the respective problem. An estimation 
algorithm is a polynomial time algorithm that outputs an upper bound and a lower 
bound on the value of the optimal solution (without necessarily producing a solution 
whose value falls within this range). The estimation ratio of the algorithm is the 
worst case ratio between the upper bound and the lower bound. An approximation 
algorithm is stronger than an estimation algorithm in the sense that it supports its
estimate by exhibiting a feasible solution of the same value. Potentially, designing 
estimation algorithms is easier than designing approximation algorithms. See for 
example the remark in Section 2.3.

For many combinatorial optimization problems, approximation thresholds are 
known. In Section 2 we survey some of these problems. For each such problem,
we sketch an efficient algorithm for estimating the optimal value of the solution. 
The estimation ratios of these algorithms (the analog of approximation ratios) meet 
the thresholds of approximation for the respective problems (up to low order ad- 
ditive terms), and hence are best possible (unless P=NP). One would expect these 
estimation algorithms to be the "state of the art" in algorithmic design. However, 
as we shall see, for every problem above there is a core version for which the best 
possible estimation algorithm is elementary: it bases its estimate only on easily 
computable properties of the input instance, such as the number of clauses in a 
formula, without searching for a solution. The core versions that we present often 
have the property that Håstad characterizes as "non-approximable beyond the
random assignment threshold". In these cases (e.g., max 3SAT), the estimates on 
the value of the optimal solution are derived from analysing the expected value of 
a random solution. 

In contrast, for problems that are known to have a polynomial time approx- 
imation scheme (namely, that can be approximated within ratios arbitrarily close 
to 1), their approximation algorithms do perform an extensive search for a good 
solution, often using dynamic programming. 

The empirical findings are discussed in Section 3.

2. A survey of some threshold results 

In this section we survey some of the known threshold of approximation results. 
For each problem we show simple lower bounds and upper bounds on the value of 
the optimal solution. Obtaining any better bounds is NP-hard. (There are certain 
exceptions to this. For set cover and domatic number the matching hardness results
are under the assumption that NP does not have slightly super-polynomial time
algorithms. The hardness results for clique and chromatic number assume that NP
does not have randomized algorithms that run in expected polynomial time.)

Conventions. For a graph, n denotes the number of vertices, m denotes the 
number of edges, δ denotes the minimum degree, Δ denotes the maximum degree.
For a formula, n denotes the number of variables, m denotes the number of clauses. 
For each problem we define its inputs, feasible solutions, value of solutions, and 
objective. We present the known approximation ratios and hardness results, which
are given up to low order additive terms. We then present a core version of the 
problem. For the core version, we present an upper bound and a lower bound on 
the value of the optimal solution. Improving over these bounds (in the worst case) 
is NP-hard. In most cases, we give hints to the proofs of our claims. We also cite 
references where full proofs can be found.



2.1. Max coverage [HI H] 

Input. A collection $S_1, \ldots, S_m$ of sets with $\bigcup_{i=1}^{m} S_i = \{1, \ldots, n\}$. A parameter k.
Feasible solution. A collection I of k indices.
Value. Number of covered items, namely $|\bigcup_{i \in I} S_i|$.
Objective. Maximize. 

Algorithm. Greedy. Iteratively add to I the set containing the maximum number 
of yet uncovered items, breaking ties arbitrarily. 
Approximation ratio. 1 − 1/e.
Hardness. 1 − 1/e.

Core. d-regular, r-uniform. Every set is of cardinality d. Every item is in r sets.
k = n/d.
Upper bound. n.
Lower bound. (1 − 1/e)n. (Pick k = n/d = m/r sets at random.)
Hardness of core. (1 − 1/e + ε) for every ε > 0, when d, r are large enough.
Remarks. For the special case in which each item belongs to exactly 4 sets, and
k = m/2, there always is a choice of sets covering at least 15n/16 items. For every
ε > 0, a (1 + ε)15/16 approximation ratio is NP-hard [16].
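
The greedy rule for max coverage can be sketched in a few lines of Python. This is only an illustration: the function name greedy_max_coverage, the representation of the input as a list of Python sets, and breaking ties by smallest index are assumptions made here, and the sketch assumes k does not exceed the number of sets.

```python
def greedy_max_coverage(sets, k):
    """Pick k set indices greedily; return (chosen indices, covered items)."""
    covered = set()
    chosen = []
    for _ in range(k):
        # Among the sets not chosen yet, take the one containing the most
        # uncovered items; max() keeps the first maximum, so ties go to the
        # smallest index (an arbitrary choice).
        best = max(
            (i for i in range(len(sets)) if i not in chosen),
            key=lambda i: len(sets[i] - covered),
        )
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered


# Hypothetical toy instance: 4 sets over the items {1, ..., 6}, k = 2.
sets = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {5, 6}]
chosen, covered = greedy_max_coverage(sets, 2)
print(chosen, len(covered))
```

On this toy instance the greedy choice happens to cover all six items; in general only the 1 − 1/e guarantee stated above applies.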



2.2. Min set cover [HI H] 

Input. A collection $S_1, \ldots, S_m$ of sets with $\bigcup_{i=1}^{m} S_i = \{1, \ldots, n\}$.
Feasible solution. A set of indices I such that $\bigcup_{i \in I} S_i = \{1, \ldots, n\}$.
Value. Number of sets used in the cover, namely, |I|.
Objective. Minimize.

Algorithm. Greedy. Iteratively add to I the set containing the maximum number
of yet uncovered items, breaking ties arbitrarily.
Approximation ratio. ln n.
Hardness. ln n.

Core. d-regular, r-uniform. Every set is of cardinality d. Every item is in r sets.
Lower bound. n/d.
Upper bound. (n/d) ln n, up to low order terms. (Include every set in I
independently with probability (ln n)/r.)
Hardness of core. ln n.

Remarks. The hardness results for set cover in 4 assume that NP does not have 
deterministic algorithms that run in slightly super-polynomial time (namely, time 
2.3. Domatic number [5]
Input. A graph. 

Feasible solution. A domatic partition of the graph. That is, a partition of the
vertices of the graph into disjoint sets, where each set is a dominating set. (A
dominating set is a set S of vertices such that every vertex not in S is adjacent
to some vertex in S.)
Value. Number of dominating sets in the partition. 
Objective. Maximize. 

Algorithm. Let δ be the minimum degree in the graph. Partition the vertices into
(1 − ε)(δ + 1)/ln n sets at random, where ε is arbitrarily small when δ/ln n is large
enough. Almost all these sets will be dominating. The sets that are not dominating
can be unified with the first of the dominating sets to give a domatic partition. The
algorithm can be derandomized to give (a somewhat unnatural) greedy algorithm.

Approximation ratio. 1/ln n.
Hardness. 1/ln n.

Core. δ/ln n is large enough.
Upper bound. δ + 1.
Lower bound. (1 − ε)(δ + 1)/ln n.
Hardness of core. (1 + ε)/ln n, for every ε > 0.

Remarks. The hardness results assume that NP does not have deterministic
algorithms that run in slightly super-polynomial time. The lower bound can be
refined to (1 − ε)(δ + 1)/ln Δ, where Δ is the maximum degree in the graph. This is
shown using a nonconstructive argument (based on a two phase application of the
local lemma of Lovász). It is not known how to find such a domatic partition in
polynomial time.
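
The random partition step described under "Algorithm" above can be sketched as follows. This is a minimal sketch under stated assumptions: the graph is a dict mapping each vertex to the set of its neighbours, n is at least 2, and the names is_dominating and random_domatic_partition are chosen here for illustration. The fallback to a single class when no random class is dominating is an added safeguard, not part of the description above, and no derandomization is attempted.

```python
import math
import random


def is_dominating(adj, part):
    """True if every vertex is in part or has a neighbour in part."""
    return all(v in part or adj[v] & part for v in adj)


def random_domatic_partition(adj, eps, rng):
    n = len(adj)
    delta = min(len(neighbours) for neighbours in adj.values())
    # Number of classes: (1 - eps)(delta + 1) / ln n, rounded down, at least 1.
    wanted = max(1, int((1 - eps) * (delta + 1) / math.log(n)))
    classes = [set() for _ in range(wanted)]
    for v in adj:
        classes[rng.randrange(wanted)].add(v)
    good = [c for c in classes if is_dominating(adj, c)]
    bad = [c for c in classes if not is_dominating(adj, c)]
    if not good:
        # Safeguard (assumption): the whole vertex set is always dominating.
        return [set(adj)]
    for c in bad:
        good[0] |= c  # unify non-dominating classes with the first good one
    return good


# Hypothetical example: the complete graph on 30 vertices (delta = 29).
adj = {v: set(range(30)) - {v} for v in range(30)}
parts = random_domatic_partition(adj, 0.1, random.Random(0))
print(len(parts), all(is_dominating(adj, p) for p in parts))
```

The output is always a valid domatic partition; how many classes survive depends on the random choices.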

2.4. k-center [171 151ITK]

Input. A metric on a set of n points and a parameter k < n. 
Feasible solution. A set S of k of the points.
Value. Distance between S and the point furthest away from S.
Objective. Minimize. 

Algorithm. Greedy. Starting with the empty set, iteratively add into S the point 
furthest away from S, resolving ties arbitrarily. 
Approximation ratio. 2. 
Hardness. 2. 

Core. The n points are vertices of a graph with minimum degree δ, equipped with
a shortest distance metric. k > n/(δ + 1).
Lower bound. 1. (Because k < n.) 

Upper bound. 2. (As long as there is a vertex of distance more than 2 from the
current set S, every step of the greedy algorithm covers at least δ + 1 vertices.)
Hardness of core. 2. (Because it is NP-hard to tell if the underlying graph has a 
dominating set of size k.) 
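
The greedy rule for k-center (repeatedly add the point furthest from the current set S) might look as follows in Python. The function name greedy_k_center, the distance-matrix input, and starting from point 0 (any start is a valid tie-break when S is empty) are assumptions made for this sketch.

```python
def greedy_k_center(dist, k):
    """dist: symmetric n x n matrix of a metric. Returns (centers, radius)."""
    n = len(dist)
    # With S empty, every point is equally far from S, so start at point 0.
    centers = [0]
    # nearest[p] = distance from p to the closest chosen center.
    nearest = list(dist[0])
    while len(centers) < k:
        far = max(range(n), key=lambda p: nearest[p])
        centers.append(far)
        nearest = [min(nearest[p], dist[far][p]) for p in range(n)]
    return centers, max(nearest)


# Hypothetical instance: six points on a line, k = 3.
points = [0, 1, 2, 10, 11, 20]
dist = [[abs(a - b) for b in points] for a in points]
centers, radius = greedy_k_center(dist, 3)
print(centers, radius)
```

On this instance greedy returns radius 2, whereas choosing the points at coordinates 1, 10 and 20 gives radius 1, so the factor of 2 is attained.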

2.5. Min sum set cover ^ 

Input. A collection $S_1, \ldots, S_m$ of sets with $\bigcup_{i=1}^{m} S_i = \{1, \ldots, n\}$.
Feasible solution. A linear ordering. That is, a one to one mapping f from the
collection of sets to {1, ..., m}.
Value. The average time by which an item is covered. Namely, $\frac{1}{n} \sum_{i} \min_{\{j \,:\, i \in S_j\}} f(j)$.
Objective. Minimize. 

Algorithm. Greedy. Iteratively pick the set containing the maximum number of yet 
uncovered items, breaking ties arbitrarily. 
Approximation ratio. 4. 
Hardness. 4. 

Core. d-regular, r-uniform. Every set is of cardinality d. Every item is in r sets.
Lower bound. n/2d = m/2r. (Because every set covers d items.)
Upper bound. m/(r + 1). (By ordering the sets at random.)
Hardness of core. 2 − ε, for every ε > 0, when r is large enough.
Remarks. Note that the core has a lower threshold of approximation than the
general case.
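
As an illustration of the objective, the following Python sketch builds the greedy ordering and computes the average cover time of an ordering. The function names and the list-of-sets input are assumptions made here; positions in the ordering are counted from 1, matching the mapping to {1, ..., m}.

```python
def greedy_min_sum_order(sets):
    """Order the sets greedily by number of yet uncovered items."""
    remaining = set().union(*sets)
    unused = list(range(len(sets)))
    order = []
    while unused:
        # Ties go to the smallest index (an arbitrary choice).
        best = max(unused, key=lambda j: len(sets[j] & remaining))
        unused.remove(best)
        order.append(best)
        remaining -= sets[best]
    return order


def average_cover_time(sets, order):
    """Average, over all items, of the first position at which it is covered."""
    first_covered = {}
    for position, j in enumerate(order, start=1):
        for item in sets[j]:
            first_covered.setdefault(item, position)
    return sum(first_covered.values()) / len(first_covered)


# Hypothetical toy instance with n = 6 items and m = 3 sets.
sets = [{1, 2}, {2, 3, 4, 5}, {5, 6}]
order = greedy_min_sum_order(sets)
print(order, average_cover_time(sets, order))
```

Here four items are covered at time 1, one at time 2 and one at time 3, for an average of 9/6.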



2.6. Min bandwidth [20] 
Input. A graph. 

Feasible solution. A linear arrangement. That is, a one to one mapping f from
the set of vertices to {1, ..., n}.

Value. Longest stretch of an edge. Namely, the maximum over all edges (i, j) of
|f(i) − f(j)|.
Objective. Minimize.

Core. The input graph G is a unit length circular arc graph. Namely, the vertices 
represent arcs of equal length on a circle. Two vertices are connected by an edge 
if their respective arcs overlap. Let ω(G) denote the size of the maximum clique in
G. (In circular arc graphs, a clique corresponds to a set of mutually intersecting
arcs, and ω(G) can be computed easily.)
Lower bound. ω(G) − 1.
Upper bound. 2ω(G) − 2.
Hardness of core. 2.

Remarks. No threshold of approximation is known for the problem on general 
graphs. 



2.7. Max 3XOR [12]

Input. A logical formula with n variables and m clauses. Every clause is the 

exclusive or of three distinct literals. 

Feasible solution. An assignment to the n variables. 

Value. Number of clauses that are satisfied. 

Objective. Maximize. 

Algorithm. Gaussian elimination can be used in order to test if the formula is
satisfiable. If the formula is not satisfiable, pick a random assignment to the variables.
Approximation ratio. 1/2. (In expectation a random assignment satisfies m/2
clauses, whereas no assignment satisfies more than m clauses.)
Hardness. 1/2. 
Core. Same as above.
Upper bound. m.
Lower bound. m/2.
Hardness of core. 1/2. 
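
The two-step algorithm for max 3XOR (Gaussian elimination over GF(2), then a random assignment if the system is inconsistent) can be sketched as below. The encoding is an assumption made for this sketch: a clause is a list of (variable index, negated) pairs, and each linear equation is stored as a bitmask of variables plus a right-hand side bit. All function names are hypothetical.

```python
import random


def clause_to_equation(clause):
    """XOR of the literals equals 1; each negation flips the right-hand side."""
    mask, rhs = 0, 1
    for var, negated in clause:
        mask ^= 1 << var
        rhs ^= int(negated)
    return mask, rhs


def solve_xor_system(equations, n):
    """Gaussian elimination over GF(2); returns an assignment or None."""
    pivots = {}  # highest set bit -> (mask, rhs)
    for mask, rhs in equations:
        while mask:
            h = mask.bit_length() - 1
            if h not in pivots:
                pivots[h] = (mask, rhs)
                break
            pivot_mask, pivot_rhs = pivots[h]
            mask ^= pivot_mask
            rhs ^= pivot_rhs
        else:
            if rhs:  # reduced to 0 = 1, so the system is inconsistent
                return None
    x = [0] * n  # free variables are set to 0
    for h in sorted(pivots):  # back-substitute from low to high pivots
        mask, rhs = pivots[h]
        for v in range(h):
            if (mask >> v) & 1:
                rhs ^= x[v]
        x[h] = rhs
    return x


def count_satisfied(clauses, x):
    return sum(
        sum(x[var] ^ int(negated) for var, negated in clause) % 2
        for clause in clauses
    )


def approx_max_3xor(clauses, n, rng):
    x = solve_xor_system([clause_to_equation(c) for c in clauses], n)
    if x is None:
        x = [rng.randrange(2) for _ in range(n)]
    return x


# Hypothetical instances: one satisfiable, one made of two contradictory clauses.
sat = [[(0, False), (1, False), (2, False)], [(0, False), (1, False), (3, True)]]
unsat = [[(0, False), (1, False), (2, False)], [(0, False), (1, False), (2, True)]]
rng = random.Random(1)
x_sat = approx_max_3xor(sat, 4, rng)
x_unsat = approx_max_3xor(unsat, 3, rng)
print(count_satisfied(sat, x_sat), count_satisfied(unsat, x_unsat))
```

In the second instance every assignment satisfies exactly one of the two clauses, so the value m/2 is the best possible there.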

2.8. Max 3AND [El

Input. A logical formula with n variables and m clauses. Every clause is the and 
of three literals. 

Feasible solution. An assignment to the n variables. 
Value. Number of clauses that are satisfied. 
Objective. Maximize. 

Algorithm. Based on semidefinite programming. 
Approximation ratio. 1/2. 
Hardness. 1/2. 

Core. Pairwise independent version. Every variable appears at most once in each
clause. For a pair of variables $x_i$ and $x_j$, let $n_{ij}(0,0)$ ($n_{ij}(0,1)$, $n_{ij}(1,0)$, $n_{ij}(1,1)$,
respectively) denote the number of clauses in which they both appear negated (i
negated and j positive, i positive and j negated, both positive, respectively). Then
for every i and j, $n_{ij}(0,0) = n_{ij}(0,1) = n_{ij}(1,0) = n_{ij}(1,1)$.

Upper bound. m/4. (Consider the pairs of literals in the satisfied clauses. There
must be at least three times as many pairs in unsatisfied clauses.)
Lower bound. m/8. (The expected number of clauses satisfied by a random
assignment.)

Hardness of core. 1/2. 

Remarks. This somewhat complicated core comes up as an afterthought, by 
analysing the structure of instances that result from the reduction from max 3XOR.

2.9. Max 3SAT fTH HH] 

Input. A logical formula with n variables and m clauses. Every clause is the or of
three literals. 

Feasible solution. An assignment to the n variables. 
Value. Number of clauses that are satisfied. 
Objective. Maximize. 

Algorithm. Based on semidefinite programming. 
Approximation ratio. 7/8. 
Hardness. 7/8. 

Core. In every clause, the three literals are distinct. 
Upper bound. m.

Lower bound. 7m/8. (The expectation for a random assignment.) 
Hardness of core. 7/8. 

Remarks. The approximation ratio of the algorithm of ^\ is determined using 
computer assisted analysis. 
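
The elementary estimate for the core of max 3SAT uses only the number of clauses m. The Python sketch below returns the bounds (7m/8, m) and, on a small hypothetical formula, checks by brute force that the average over all assignments equals 7m/8. It assumes that every clause uses three distinct variables, which is slightly stronger than distinct literals, and encodes a clause as a list of (variable index, negated) pairs; all names are illustrative.

```python
from fractions import Fraction
from itertools import product


def elementary_estimate(m):
    """Lower and upper bound on the optimum, from the clause count alone."""
    return Fraction(7 * m, 8), m


def count_satisfied(clauses, x):
    # A literal (var, negated) is true exactly when x[var] differs from negated.
    return sum(any(x[var] != negated for var, negated in c) for c in clauses)


def expected_random_value(clauses, n):
    """Exact expectation for a uniformly random assignment, by enumeration."""
    total = sum(count_satisfied(clauses, x) for x in product((0, 1), repeat=n))
    return Fraction(total, 2 ** n)


# Hypothetical formula with n = 4 variables and m = 4 clauses.
clauses = [
    [(0, False), (1, False), (2, False)],
    [(0, True), (1, False), (3, True)],
    [(1, True), (2, True), (3, False)],
    [(0, False), (2, True), (3, True)],
]
lower, upper = elementary_estimate(len(clauses))
average = expected_random_value(clauses, 4)
best = max(count_satisfied(clauses, x) for x in product((0, 1), repeat=4))
print(lower, average, best, upper)
```

The enumeration is exponential in n and serves only to check the expectation on a toy instance; the estimate itself needs nothing beyond m.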
2.10. Inapproximable problems

For some problems the best approximation ratios known are of the form $n^{1-\epsilon}$
for every ε > 0 (where the range of the objective function is between 1 and n). This
is often interpreted as saying that approximation algorithms are almost helpless
with respect to these problems. Among these problems we mention finding the
smallest maximal independent set [10], max clique [11], and chromatic number [H].
The hardness results in Ej are proved under the assumption that NP does not 
have expected polynomial time randomized algorithms. 

2.11. Thresholds within multiplicative factors 

For some problems, the best approximation ratio is of the form $O(n^{c})$, for
some 0 < c < 1, and there is a hardness of approximation result of the form $\Omega(n^{c'})$,
where c' can be chosen arbitrarily close to c. We may view these as thresholds of
approximation up to a low order multiplicative factor. An interesting example of
this sort is the max disjoint paths problem [9].

Input. A directed graph and a set S of pairs of terminals $\{(s_1, t_1), \ldots, (s_k, t_k)\}$.
Feasible solution. A collection of edge disjoint paths, each connecting $s_i$ to $t_i$ for
some i.

Value. Number of pairs of terminals from S that are connected by some path in 
the solution. 
Objective. Maximize. 

Algorithm. Greedy. Iteratively add the shortest path that connects some yet un- 
connected pair from S. 

Approximation ratio. $m^{-1/2}$, where m is the number of arcs in the graph.
Hardness. $m^{-1/2+\epsilon}$, for every ε > 0.
Core. There is a path from $s_1$ to $t_1$.
Upper bound. k. At most all pairs can be connected simultaneously by disjoint
paths.
Lower bound. 1. There is a path from $s_1$ to $t_1$.
Hardness of core. 1/k. Here k can be chosen as $m^{1/2-\epsilon}$ for arbitrarily small ε.
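
The greedy rule for max disjoint paths (repeatedly route the shortest path among the yet unconnected pairs, then delete its arcs) can be sketched in Python as follows. The names shortest_path and greedy_disjoint_paths, the arc-list input, and the use of breadth-first search for shortest paths are assumptions made for this illustration.

```python
from collections import deque


def shortest_path(adj, s, t):
    """Breadth-first search; returns the arcs of a shortest s-t path or None."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            break
        for v in adj.get(u, ()):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    if t not in parent:
        return None
    path = []
    while parent[t] is not None:
        path.append((parent[t], t))
        t = parent[t]
    return path[::-1]


def greedy_disjoint_paths(arcs, pairs):
    adj = {}
    for u, v in arcs:
        adj.setdefault(u, set()).add(v)
    routed = {}
    pending = list(pairs)
    while pending:
        found = [(p, shortest_path(adj, *p)) for p in pending]
        found = [(p, path) for p, path in found if path is not None]
        if not found:
            break
        # Route the pair whose current shortest path is shortest overall.
        pair, path = min(found, key=lambda item: len(item[1]))
        for u, v in path:
            adj[u].discard(v)  # arcs of a routed path cannot be reused
        routed[pair] = path
        pending.remove(pair)
    return routed


# Hypothetical instance: both pairs need the arc (2, 3), so only one is routed.
arcs = [(1, 2), (2, 3), (3, 4), (5, 2), (3, 6)]
routed = greedy_disjoint_paths(arcs, [(1, 4), (5, 6)])
print(routed)
```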

3. Discussion 

We summarize the pattern presented in Section 2. Given an NP-hard
combinatorial optimization problem that has an approximation threshold, we identify a
core version of the problem. For the core version we identify certain key parameters.
Thereafter, an upper bound and a lower bound on the value of the optimal solution
are expressed by formulas involving only these parameters. Even an algorithm that
examines the input for polynomial time cannot output better lower bounds or upper 
bounds on the value of the optimal solution (in the worst case, and up to low order 
terms), unless P=NP. 

Note that if we do not restrict what qualifies as being a key parameter, then 
the pattern above can be enforced on essentially any problem with an
approximation threshold. We can simply take the key parameter to be the output of an
approximation algorithm for the same problem. Likewise, if we do not restrict what
qualifies as a core version, we can simply take the core version to be all those input 
instances on which a certain approximation algorithm outputs a certain value. 

Hence we would like some restrictions on what may qualify as a key parameter, 
or as a core version. One option is to enforce a computational complexity restric- 
tion. Namely, the core should be such that deciding whether an input instance 
belongs to the core is computationally easy, for example, computable in logarithmic 
space. (Note that in this respect, the core that we defined for the max disjoint paths
problem may be problematic, because testing whether there is a directed path from
$s_1$ to $t_1$ is complete for nondeterministic logarithmic space.) Likewise, computing
the key parameters should be easy. Another option is to enforce structural restric- 
tions. For example, membership in the core should be invariant over renaming of 
variables. Ideally, the notions of "core" and "key parameter" should be defined well 
enough so that we should be able to say that certain classes of inputs do not qualify 
as being a core (e.g., because the class is not closed under certain operations), and 
that certain parameters are not legitimate parameters to be used by an estimation 
algorithm. Also, the definitions should allow in principle the possibility that certain 
problems have approximation thresholds without having a core. 

Hand in hand with suggesting formal definitions for the notion of a core, it
would be useful to collect more data. Namely, to find more approximation threshold
results, and to identify plausible core versions for these problems. As the case of
min bandwidth shows, one may find core versions even for problems for which an 
approximation threshold is not known. 

In this manuscript we mainly addressed the issue of collecting data regard- 
ing approximation thresholds, and describing this data using the notion of a core. 
The issue of uncovering principles that explain why the patterns discussed in this 
manuscript emerge is left to the reader. 